AITopics | fake data

fake data

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

53dbd7e34fab703a639964e2d3ee9e84-Paper-Conference.pdf

Neural Information Processing Systems, Feb-13-2026, 10:55:38 GMT

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Vision (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

A Review on Domain Adaption and Generative Adversarial Networks (GANs)

Dhawan, Aashish, Mudgal, Divyanshu

arXiv.org Artificial Intelligence, Oct-15-2025

In a field such as image classification, where data is of utmost importance, we need more reliable methods that can overcome data scarcity and produce results comparable to previous benchmarks. In most cases, obtaining labeled data is very difficult because of the high cost of human labor, and in some cases it is impossible. The purpose of this paper is to discuss Domain Adaption and various methods to implement it. The main idea is to use a model trained on a particular dataset to make predictions on data from a different domain of the same kind: for example, a model trained on paintings of airplanes predicting on real images of airplanes.
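To make "use a model trained on one domain on another domain" concrete, here is a minimal sketch of one of the simplest non-adversarial baselines, not one of the GAN-based methods the review surveys: rescale each source-domain feature so that its mean and standard deviation match the unlabeled target domain. This is a diagonal simplification of correlation alignment (CORAL), which aligns full covariance matrices. The function names and the toy feature rows are hypothetical.

```python
import statistics


def column_stats(rows):
    """Per-feature mean and population standard deviation of a list of feature rows."""
    cols = list(zip(*rows))
    return [statistics.fmean(c) for c in cols], [statistics.pstdev(c) for c in cols]


def align_source_to_target(source, target):
    """Affinely rescale each source feature so its mean and std match the target domain.

    Only unlabeled target features are needed; source labels are kept unchanged.
    """
    s_mean, s_std = column_stats(source)
    t_mean, t_std = column_stats(target)
    return [
        # `or 1.0` avoids division by zero when a source feature is constant
        [(x - sm) / (ss or 1.0) * ts + tm
         for x, sm, ss, ts, tm in zip(row, s_mean, s_std, t_std, t_mean)]
        for row in source
    ]


# Hypothetical features: "paintings" (source) vs. "photos" (target).
source = [[0.0, 10.0], [2.0, 14.0], [4.0, 18.0]]
target = [[1.0, 0.0], [1.5, 1.0], [2.0, 2.0], [1.2, 0.5]]
aligned = align_source_to_target(source, target)
```

A classifier would then be trained on `aligned` with the original source labels and applied to raw target features. Matching only first and second moments per feature is a deliberately weak assumption; the adversarial methods discussed in the review aim to align whole feature distributions.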

artificial intelligence, domain adaption, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2510.12075

Country: Oceania > Australia (0.15)

Genre: Research Report (0.64)

Industry: Transportation (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Model Collapse Demystified: The Case of Regression

Dohmatob, Elvis, Feng, Yunzhen, Kempe, Julia (FAIR, Meta; Center for Data Science, New York University)

Neural Information Processing Systems, Oct-10-2025, 02:47:39 GMT

The phenomenon of "model collapse" refers to the situation in which, as a model is trained recursively on data generated by previous generations of itself, its performance degrades over time until the model eventually becomes completely useless, i.e., the model collapses.

model collapse, test error, theorem 4, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.40)
Asia > Middle East > Jordan (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Vision (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection

Zeng, Qirun, He, Eric, Hoffmann, Richard, Wang, Xuchuang, Zuo, Jinhang

arXiv.org Artificial Intelligence, Jun-3-2025

Adversarial attacks on stochastic bandits have traditionally relied on some unrealistic assumptions, such as per-round reward manipulation and unbounded perturbations, limiting their relevance to real-world systems. We propose a more practical threat model, Fake Data Injection, which reflects realistic adversarial constraints: the attacker can inject only a limited number of bounded fake feedback samples into the learner's history, simulating legitimate interactions. We design efficient attack strategies under this model, explicitly addressing both magnitude constraints (on reward values) and temporal constraints (on when and how often data can be injected). Our theoretical analysis shows that these attacks can mislead both Upper Confidence Bound (UCB) and Thompson Sampling algorithms into selecting a target arm in nearly all rounds while incurring only sublinear attack cost. Experiments on synthetic and real-world datasets validate the effectiveness of our strategies, revealing significant vulnerabilities in widely used stochastic bandit algorithms under practical adversarial scenarios.
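The threat model can be illustrated with a toy simulation. The sketch below runs standard UCB1 on Bernoulli arms while a deliberately simple attacker, which is an assumption of this example and not one of the paper's attack strategies, respects both constraints from the abstract: every fake sample carries a legal reward (0.0, inside [0, 1]), and at most one fake sample per non-target arm is injected per round. It injects only when a non-target arm's empirical mean is not at least `delta` below the target arm's. All names and parameter values are hypothetical.

```python
import math
import random


def ucb1_under_injection(means, target, horizon, delta=0.2, seed=0):
    """UCB1 on Bernoulli arms while a simplified attacker injects fake samples.

    Returns (real pulls per arm, number of injected fake samples).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # history size seen by the learner: real pulls + fake samples
    sums = [0.0] * k   # summed rewards in that history (fake samples add 0.0)
    real_pulls = [0] * k
    injections = 0

    for t in range(1, horizon + 1):
        # Attacker: bounded reward value and at most one fake sample per arm per round.
        if all(counts):
            target_mean = sums[target] / counts[target]
            for a in range(k):
                if a != target and sums[a] / counts[a] > target_mean - delta:
                    counts[a] += 1          # fake sample with reward 0.0
                    injections += 1
        # Learner: standard UCB1, unaware that part of its history is fake.
        if t <= k:
            arm = t - 1
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        real_pulls[arm] += 1
    return real_pulls, injections


pulls, cost = ucb1_under_injection(means=[0.9, 0.5], target=1, horizon=5000)
```

Although arm 0 is clearly better, the injected zeros keep its empirical mean low and shrink its confidence bonus, so UCB1 plays the target arm in most rounds while the number of injections stays a small fraction of the horizon.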

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2505.21938

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.46)

Targeted Augmented Data for Audio Deepfake Detection

Astrid, Marcella, Ghorbel, Enjie, Aouada, Djamila

arXiv.org Artificial Intelligence, Jul-10-2024

The availability of highly convincing audio deepfake generators highlights the need for designing robust audio deepfake detectors. Existing works often rely solely on real and fake data available in the training set, which may lead to overfitting, thereby reducing the robustness to unseen manipulations. To enhance the generalization capabilities of audio deepfake detectors, we propose a novel augmentation method for generating audio pseudo-fakes targeting the decision boundary of the model. Inspired by adversarial attacks, we perturb original real data to synthesize pseudo-fakes with ambiguous prediction probabilities. Comprehensive experiments on two well-known architectures demonstrate that the proposed augmentation contributes to improving the generalization capabilities of these architectures.
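The idea of perturbing real data until the detector becomes uncertain can be sketched with a stand-in model. Below, a hypothetical linear detector scores feature vectors as P(fake | x) = sigmoid(w · x + b), and a real sample is nudged with sign-gradient steps (in the style of FGSM/PGD adversarial attacks) toward P(fake) ≈ 0.5, inside an L-infinity ball of radius `eps` around the original. The detector, feature values, and step parameters are illustrative assumptions; the paper works with audio deepfake detection architectures, not this linear model.

```python
import math


def fake_probability(x, w, b):
    """Hypothetical linear detector: P(fake | x) = sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def make_pseudo_fake(x0, w, b, eps=1.0, step=0.05, tol=0.05, max_steps=100):
    """Move a real sample toward the decision boundary with sign-gradient steps.

    For a linear detector the gradient of the logit w.r.t. x is w, so stepping
    along sign(w) raises P(fake) and stepping against it lowers P(fake).
    Each coordinate stays within [x0_i - eps, x0_i + eps].
    """
    x = list(x0)
    for _ in range(max_steps):
        p = fake_probability(x, w, b)
        if abs(p - 0.5) < tol:          # ambiguous enough: stop
            break
        direction = 1.0 if p < 0.5 else -1.0
        x = [
            min(max(xi + direction * step * ((wi > 0) - (wi < 0)), x0i - eps), x0i + eps)
            for xi, wi, x0i in zip(x, w, x0)
        ]
    return x


w, b = [2.0, -1.0], 0.0
real = [-1.0, 0.5]                      # P(fake) is about 0.08
pseudo_fake = make_pseudo_fake(real, w, b)
```

The resulting pseudo-fake would be added to the training set labeled as fake, which pushes the detector to tighten its boundary around real data instead of overfitting to the specific fakes it has seen.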

augmentation, augmented data, detection, (14 more...)

arXiv.org Artificial Intelligence

2407.07598

Country: Africa > Middle East > Tunisia > Manouba Governorate > Manouba (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Model Collapse Demystified: The Case of Regression

Dohmatob, Elvis, Feng, Yunzhen, Kempe, Julia

arXiv.org Artificial Intelligence, Feb-12-2024

In the era of large language models like ChatGPT, the phenomenon of "model collapse" refers to the situation in which, as a model is trained recursively on data generated by previous generations of itself, its performance degrades over time until the model eventually becomes completely useless, i.e., the model collapses. In this work, we study this phenomenon in the simplified setting of kernel regression and obtain results that show a clear crossover between a regime where the model can cope with fake data and a regime where its performance completely collapses. Under polynomially decaying spectral and source conditions, we obtain modified scaling laws that exhibit new crossover phenomena from fast to slow rates. We also propose a simple strategy based on adaptive regularization to mitigate model collapse. Our theoretical results are validated with experiments.
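The recursive-training loop can be reproduced in the simplest possible regression setting. The sketch below is an illustrative assumption, not the paper's kernel-regression analysis or its adaptive-regularization fix: a one-dimensional no-intercept linear model where generation 0 is trained on labels from the true slope and every later generation is trained on labels produced by the previous generation's fitted slope plus fresh noise. All parameter values are hypothetical.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Least-squares slope for the no-intercept model y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def collapse_curve(w_true=1.0, n=50, noise=1.0, generations=10, trials=500, seed=0):
    """Mean squared parameter error (w_hat - w_true)**2 after each generation.

    Generation 0 learns from the true model; generation g > 0 learns from labels
    generated by generation g - 1, so estimation errors accumulate.
    """
    rng = random.Random(seed)
    per_gen = [[] for _ in range(generations + 1)]
    for _ in range(trials):
        labeler = w_true                        # model that produces the training labels
        for g in range(generations + 1):
            xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
            ys = [labeler * x + rng.gauss(0.0, noise) for x in xs]
            labeler = fit_slope(xs, ys)         # this generation's model labels the next one
            per_gen[g].append((labeler - w_true) ** 2)
    return [statistics.fmean(errs) for errs in per_gen]


curve = collapse_curve()
```

For inputs drawn uniformly from [-1, 1], the excess test error equals (w_hat - w_true)² · E[x²] = (w_hat - w_true)² / 3, so `curve` is proportional to excess test error. In this toy setting each generation adds an independent estimation error, and the error grows roughly linearly with the number of generations instead of shrinking.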

model collapse demystified, regression, test error, (13 more...)

arXiv.org Artificial Intelligence

2402.07712

Country:

North America > United States > New York (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Russia (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Generative AI to Generate Test Data Generators

Baudry, Benoit, Etemadi, Khashayar, Fang, Sen, Gamage, Yogya, Liu, Yi, Liu, Yuxin, Monperrus, Martin, Ron, Javier, Silva, André, Tiwari, Deepika

arXiv.org Artificial Intelligence, Jan-31-2024

Generating fake data is an essential dimension of modern software testing, as demonstrated by the number and significance of data faking libraries. Yet, developers of faking libraries cannot keep up with the wide range of data to be generated for different natural languages and domains. In this paper, we assess the ability of generative AI for generating test data in different domains. We design three types of prompts for Large Language Models (LLMs), which perform test data generation tasks at different levels of integrability: 1) raw test data generation, 2) synthesizing programs in a specific language that generate useful test data, and 3) producing programs that use state-of-the-art faker libraries. We evaluate our approach by prompting LLMs to generate test data for 11 domains. The results show that LLMs can successfully generate realistic test data generators in a wide range of domains at all three levels of integrability.
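For readers unfamiliar with the second level of integrability, "a program that generates useful test data", the sketch below shows the general shape of such a generator. It is hand-written for illustration, not output from the paper's prompts; the customer domain, field names, and value lists are hypothetical. Seeding a private random generator makes the data reproducible across test runs, and the reserved `example.com`/`example.org` domains keep generated e-mail addresses harmless.

```python
import random
from dataclasses import dataclass

FIRST_NAMES = ["Ana", "Ben", "Chen", "Dara", "Emil", "Fatima"]
LAST_NAMES = ["Silva", "Moreau", "Nguyen", "Larsen", "Okafor", "Rossi"]
DOMAINS = ["example.com", "example.org"]   # reserved domains, safe for test data


@dataclass(frozen=True)
class Customer:
    name: str
    email: str
    age: int


def fake_customers(count: int, seed: int = 42) -> list[Customer]:
    """Generate reproducible fake customer records for tests."""
    rng = random.Random(seed)
    records = []
    for i in range(count):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        # The running index keeps e-mail addresses unique within one batch.
        email = f"{first}.{last}{i}@{rng.choice(DOMAINS)}".lower()
        records.append(Customer(f"{first} {last}", email, rng.randint(18, 90)))
    return records


batch = fake_customers(5)
```

The third level described in the abstract would replace the hand-maintained value lists with calls to an existing faker library, which already ships locale-specific data.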

faker, generator, test data, (15 more...)

arXiv.org Artificial Intelligence

2401.17626

Country:

Europe > Portugal > Lisbon > Lisbon (0.15)
North America > United States > Massachusetts > Suffolk County > Boston (0.05)
Europe > Sweden (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.70)

Strategic Data Augmentation with CTGAN for Smart Manufacturing: Enhancing Machine Learning Predictions of Paper Breaks in Pulp-and-Paper Production

Khosravi, Hamed, Farhadpour, Sarah, Grandhi, Manikanta, Raihan, Ahmed Shoyeb, Das, Srinjoy, Ahmed, Imtiaz

arXiv.org Artificial Intelligence, Nov-15-2023

A significant challenge for predictive maintenance in the pulp-and-paper industry is the infrequency of paper breaks during the production process. In this article, we analyze operational data from a paper manufacturing machine on which paper breaks are relatively rare but have a high economic impact. Using a dataset of 18,398 instances derived from a quality assurance protocol, we address the scarcity of break events (124 cases), which poses a challenge for machine learning predictive models. Combining Conditional Tabular Generative Adversarial Networks (CTGAN) with the Synthetic Minority Oversampling Technique (SMOTE), we implement a novel data augmentation framework. This method not only ensures that the synthetic data mirrors the distribution of the real operational data but also seeks to improve the performance metrics of predictive modeling. Before and after data augmentation, we evaluate three machine learning algorithms: Decision Trees (DT), Random Forest (RF), and Logistic Regression (LR). With the CTGAN-enhanced dataset, our study achieved significant improvements in predictive maintenance performance metrics. The efficacy of CTGAN in addressing data scarcity was evident: the models' detection of machine breaks (Class 1) improved by over 30% for Decision Trees, 20% for Random Forest, and nearly 90% for Logistic Regression. Through this methodological advancement, the study contributes to industrial quality control and maintenance scheduling by addressing rare-event prediction in manufacturing processes.
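CTGAN is too large to sketch here, but the SMOTE half of the framework fits in a few lines. The basic SMOTE idea is to create a synthetic minority sample (here, a break event) by picking a minority point, choosing one of its k nearest minority-class neighbours, and interpolating at a random position on the segment between them. The pure-Python version below is a minimal sketch of that basic variant; the function name, parameters, and toy points are assumptions, and the paper presumably used a library implementation.

```python
import random
from typing import List

Point = List[float]


def smote(minority: List[Point], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Basic SMOTE: interpolate between minority samples and their nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a: Point, b: Point) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of `base` (excluding itself)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()                      # uniform in [0, 1)
        synthetic.append([bi + gap * (ni - bi) for bi, ni in zip(base, nn)])
    return synthetic


breaks = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical break events
extra = smote(breaks, n_synthetic=20, k=2, seed=1)
```

Because every synthetic point is a convex combination of two real minority points, SMOTE never leaves the region spanned by existing break events; a generative model such as CTGAN can, in principle, produce more varied samples.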

class 1, dataset, real data, (12 more...)

arXiv.org Artificial Intelligence

2311.09333

Country: North America > United States > West Virginia > Monongalia County > Morgantown (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Materials > Paper & Forest Products > Paper Products (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
(2 more...)

CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis

Lee, Chaejeong, Kim, Jayoung, Park, Noseong

arXiv.org Artificial Intelligence, Sep-21-2023

With growing attention to tabular data, synthetic tables are being applied to an expanding range of tasks and scenarios. Owing to recent advances in generative modeling, fake data generated by tabular data synthesis models has become sophisticated and realistic. However, modeling the discrete variables (columns) of tabular data remains difficult. In this work, we propose to process continuous and discrete variables separately, using two diffusion models that are conditioned on each other. The two diffusion models co-evolve during training by reading conditions from each other. To further bind the diffusion models, we introduce a contrastive learning method with negative sampling. In experiments with 11 real-world tabular datasets and 8 baseline methods, we demonstrate the efficacy of the proposed method, called CoDi.
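Processing the two column types "separately" means each needs its own noising process. The sketch below shows only the forward (noising) half, under the assumption that continuous columns use Gaussian DDPM-style noise and discrete columns use multinomial diffusion, where a category is gradually replaced by a uniformly random one. The linear schedule from 1e-4 to 0.02 over T = 1000 steps is the common DDPM default, assumed here rather than taken from CoDi. The co-evolving denoisers and the contrastive loss are not shown.

```python
import math
import random
from typing import List


def alpha_bar(t: int, T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_s), s = 1..t, for a linear beta schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (T - 1)
        prod *= 1.0 - beta
    return prod


def noise_continuous(x0: List[float], t: int, rng: random.Random) -> List[float]:
    """Gaussian forward process: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps."""
    ab = alpha_bar(t)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in x0]


def categorical_probs(category: int, num_classes: int, t: int) -> List[float]:
    """Multinomial forward process: q(x_t | x_0) = ab * onehot(x_0) + (1 - ab) / K."""
    ab = alpha_bar(t)
    return [ab * (c == category) + (1.0 - ab) / num_classes for c in range(num_classes)]


def noise_categorical(category: int, num_classes: int, t: int, rng: random.Random) -> int:
    """Sample a noised category from q(x_t | x_0)."""
    probs = categorical_probs(category, num_classes, t)
    return rng.choices(range(num_classes), weights=probs, k=1)[0]
```

At small t a row is almost unchanged; near t = T the continuous columns are close to standard normal noise and each categorical column is close to uniform over its K categories. In CoDi's design, each denoiser would additionally be conditioned on the other model's (noised) columns.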

co-evolving contrastive diffusion model, diffusion model, tabular data, (13 more...)

arXiv.org Artificial Intelligence

2304.12654

Country:

South America > Peru (0.04)
North America > Mexico (0.04)
South America > Colombia (0.04)
(4 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine (1.00)
Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

ComGAN: Toward GANs Exploiting Multiple Samples

Lee, Haeone

arXiv.org Artificial Intelligence, Apr-24-2023

In this paper, we propose ComGAN (Comparative GAN), which allows the generator in GANs to refer to the semantics of comparative samples (e.g., real data) by comparison. ComGAN generalizes relativistic GANs to arbitrary architectures and, with a simple input-concatenation architecture, mostly outperforms relativistic GANs. To train the discriminator in ComGAN, we also propose equality regularization, which fits the discriminator to a neutral label for samples that are equally real or equally fake. Equality regularization substantially boosts the performance of ComGAN (including WGAN) while being exceptionally simple compared to existing regularizations. Finally, we generalize the comparative samples, which relativistic GANs fix to real data, to fake data, and show that such objectives are sound in both theory and practice. Our experiments demonstrate the superior performance of ComGAN and equality regularization, which achieve the best FID in 7 out of 8 combinations of loss and dataset compared with ordinary and relativistic GANs.
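The loss terms involved can be written down on scalar discriminator outputs. The sketch below shows the standard relativistic (RSGAN) discriminator loss, -log sigmoid(C(x_real) - C(x_fake)), which ComGAN generalizes, and one plausible reading of equality regularization: binary cross-entropy that pushes the pair discriminator's logit for two equally real (or equally fake) inputs toward the neutral label 0.5. The exact form of the paper's regularizer, including the choice of BCE and the 0.5 target, is an assumption here.

```python
import math
from typing import Sequence


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy of sigmoid(logit) against a (possibly soft) target in [0, 1]."""
    # -log(sigmoid(z)) = softplus(-z) and -log(1 - sigmoid(z)) = softplus(z)
    return target * softplus(-logit) + (1.0 - target) * softplus(logit)


def relativistic_d_loss(c_real: Sequence[float], c_fake: Sequence[float]) -> float:
    """RSGAN discriminator loss averaged over paired samples: -log sigmoid(C(x_r) - C(x_f))."""
    return sum(softplus(-(r - f)) for r, f in zip(c_real, c_fake)) / len(c_real)


def equality_regularizer(pair_logits: Sequence[float]) -> float:
    """Hypothetical equality term: pull D(a, b) toward the neutral label 0.5
    for pairs whose members are equally real (or equally fake)."""
    return sum(bce_with_logits(z, 0.5) for z in pair_logits) / len(pair_logits)
```

With a 0.5 target the regularizer is minimized at logit 0 (value ln 2), so the discriminator is penalized for expressing a preference when there is nothing to prefer. In a real training loop these terms would be computed on tensors from the discriminator network and added with a weighting coefficient.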

artificial intelligence, machine learning, regularization, (17 more...)

arXiv.org Artificial Intelligence

2304.12098

Country:

North America (0.14)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
